-
Notifications
You must be signed in to change notification settings - Fork 1
Performance diagnostics #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tracks individual phase durations for statement processing: - parse, encrypt, server_write, server_wait, server_response, client_write, decrypt Enables detailed performance breakdown in diagnostic logs.
Captures statement-level metadata: - statement_type (INSERT/UPDATE/DELETE/SELECT/OTHER) - protocol (simple/extended) - encrypted flag and value count - param_bytes and query_fingerprint Enables labeled metrics and detailed diagnostic logs.
New log target for slow statement logging, configurable via CS_LOG__SLOW_STATEMENTS_LEVEL environment variable (default: warn).
Adds configuration options for slow statement logging: - CS_LOG__SLOW_STATEMENTS: enable slow statement logging - CS_LOG__SLOW_STATEMENT_MIN_DURATION_MS: threshold (default 2000ms) - CS_LOG__SLOW_STATEMENTS_LEVEL: log level (default warn when enabled) Simple config: just set CS_LOG__SLOW_STATEMENTS=true
SessionMetricsContext now tracks phase timing breakdown and statement metadata for diagnostic logging.
Measures time for ZeroKMS cipher initialization, which includes the network call to ZeroKMS. This helps isolate ZeroKMS latency as a bottleneck separate from local crypto operations.
…ration() accessors Convenient accessors for slow statement logging configuration.
Adds methods to Context for recording phase durations: - record_parse_duration (set once) - add_encrypt_duration/add_decrypt_duration (accumulate) - add_server_write_duration/add_server_response_duration/add_client_write_duration (accumulate) - record_server_wait_or_add_response (helper for first response vs subsequent) - update_statement_metadata for metadata updates These enable frontend.rs and backend.rs to instrument phases.
- Frontend tracks server write time - Backend splits first response wait vs response time - Backend tracks client write time
Records write_to_server duration using add_server_write_duration helper
parse_handler tracks: - Parse duration - Statement type and fingerprint bind_handler tracks: - Parameter bytes size
encrypt_literals and encrypt_params now record their duration to the session phase timing for slow-statement breakdown.
Tracks: - Parse/typecheck duration for phase breakdown - Statement type (INSERT/UPDATE/DELETE/SELECT/OTHER) - Protocol type (simple) - Query fingerprint - Multi-statement flag for simple queries
- Task 8: add_encrypt_duration and update_statement_metadata in encrypt_literals and encrypt_params - Task 9: query_handler already instrumented - Task 10: parse_handler with Extended protocol timing/metadata, bind_handler with param_bytes - Task 10a: write_to_server with add_server_write_duration
Documents: - Configuration options for slow statement logging - Slow statement log structure and interpretation - Prometheus metric labels and example queries - Common symptom/cause mapping
…tatement logging Existing duration histograms now include labels: - statement_type: insert/update/delete/select/other - protocol: simple/extended - mapped: true/false - multi_statement: true/false Slow statement logging with phase breakdown when enabled. Enables Grafana filtering by statement type and encryption status.
- Integration test scrapes /metrics and asserts statement labels - Adds diagnostics module for testing metrics endpoint - Adds reqwest HTTP client dependency for metrics scraping
- Record parse duration before encryption work starts to avoid double-counting - Update encrypt_params metadata unconditionally for slow-statement logging - Add decrypt duration recording via add_decrypt_duration() call - Remove unused re-exports from context/mod.rs - Fix needless_borrow clippy warnings in query_handler loop These changes ensure phase timings accurately reflect actual work: - parse_ms captures SQL parsing + type-checking only - encrypt_ms and decrypt_ms are tracked separately without overlap - Statement metadata updates regardless of Prometheus status
Auto-formatting applied by cargo fmt to various files.
Updates duration metric assertions to use regex matching that supports labeled metrics, ensuring quantile matches work regardless of other labels present.
Adds port 9930 to the proxy-tls service in docker-compose.yml to enable metrics collection during TLS-enabled integration tests.
Add doc comments explaining why timing data is accumulated in ExecuteContext during backend message processing and transferred to SessionMetricsContext on completion. This pattern exists because the backend operates on the execute queue without direct access to the session metrics queue.
Replace magic numbers in fetch_metrics_with_retry with documented constants: - METRICS_FETCH_MAX_RETRIES (5) - ~1 second total wait with 200ms delay - METRICS_FETCH_RETRY_DELAY_MS (200) - balance responsiveness with scrape time
Address code review suggestions: - Add warning log when bind_handler falls back to latest_session_id() - Thread session_id explicitly through write_to_server signature - Combine duplicate Option<SessionId> guards in encrypt_params
…server All callers passed None, making the parameter obsolete. Simplified to always use latest_session_id() directly.
| tracing = { workspace = true } | ||
| tracing-subscriber = { workspace = true } | ||
| uuid = { version = "1.11.0", features = ["serde", "v4"] } | ||
| reqwest = { version = "0.12", features = ["rustls-tls"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest reqwest is 0.13
| /// Return lowercase label for metrics | ||
| pub fn as_label(&self) -> &'static str { | ||
| match self { | ||
| StatementType::Insert => "insert", | ||
| StatementType::Update => "update", | ||
| StatementType::Delete => "delete", | ||
| StatementType::Select => "select", | ||
| StatementType::Other => "other", | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This label is lowercase but the serde serializer uses uppercase. Is that intentional?
| pub fn set_query_fingerprint(&mut self, sql: &str) { | ||
| use std::collections::hash_map::DefaultHasher; | ||
| use std::hash::{Hash, Hasher}; | ||
|
|
||
| let mut hasher = DefaultHasher::new(); | ||
| sql.hash(&mut hasher); | ||
| let hash = hasher.finish(); | ||
| self.query_fingerprint = Some(format!("{:08x}", hash & 0xFFFFFFFF)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the fingerprint is going to be included in logs or metrics then I'd recommend using a keyed hash/HMAC (e.g. Blake3) to generate the fingerprint. The default hasher is non-cryptographic and it would feasible to brute-force/dictionary attack to reveal statements which may contain sensitive data.
Risk
Likelihood: low
Impact: moderate
coderdan
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is epic work! Query fingerprint I think should be addressed before merging.
Slow Statement Logging
Logs detailed breakdown when statement exceeds threshold (default 2s)
Labeled Metrics
Existing histograms (statements_session_duration_seconds, statements_execution_duration_seconds) get labels:
ZeroKMS Cipher Init Timing
New metric: keyset_cipher_init_duration_seconds
Configuration